Indexing Structures for Approximate String Matching
Abstract
In this paper we give what are, to our knowledge, the first structures and corresponding algorithms for approximate indexing under the Hamming distance that have the following properties. (i) Their size is, on average, linear in the size of the text times a polylogarithmic factor. (ii) For each pattern x, the time our algorithms spend finding the list occ(x) of all occurrences of x in the text, up to a certain distance, is proportional on average to |x| + |occ(x)|, under an additional but realistic hypothesis.
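To make the notation concrete, the sketch below computes occ(x) by brute force: it reports every text position where a window of length |x| is within Hamming distance k of x. This is only an index-free baseline for illustrating the definition, not the structures described in the abstract; the function names `hamming` and `occ` and the threshold parameter `k` are assumptions introduced here.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(c1 != c2 for c1, c2 in zip(a, b))


def occ(text: str, x: str, k: int) -> list[int]:
    """Start positions where x occurs in text with at most k mismatches.

    Naive scan: O(|text| * |x|) time, no preprocessing of the text.
    """
    m = len(x)
    return [
        i
        for i in range(len(text) - m + 1)
        if hamming(text[i:i + m], x) <= k
    ]


if __name__ == "__main__":
    # "bana" differs from "nana" in one position; "nana" matches exactly.
    print(occ("banana", "nana", 1))
```

The scan costs time proportional to |text| · |x| per query, which is the cost an index with query time proportional to |x| + |occ(x)| is meant to avoid.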
Similar resources
Indexing Methods for Approximate String Matching
Indexing for approximate text searching is a novel problem receiving much attention because of its applications in signal processing, computational biology and text retrieval, to name a few. We classify most indexing methods in a taxonomy that helps understand their essential features. We show that the existing methods, rather than completely different as they are regarded, form a range of solut...
A Hybrid Indexing Method for Approximate String Matching
We present a new indexing method for the approximate string matching problem. The method is based on a suffix array combined with a partitioning of the pattern. We analyze the resulting algorithm and show that the average retrieval time is $O(n^{\lambda})$, for some $\lambda > 0$ that depends on the error fraction $\alpha$ tolerated and the alphabet size $\sigma$. It is shown that $\lambda < 1$ for approximately $\alpha < 1 - e/\sqrt{\sigma}$, where $e = 2.718\ldots$. The space required is four times...
A New Indexing Method for Approximate String Matching
We present a new indexing method for the approximate string matching problem. The method is based on a suffix tree combined with a partitioning of the pattern. We analyze the resulting algorithm and show that the retrieval time is $O(n^{\lambda})$, for $0 < \lambda < 1$, whenever $\alpha < 1 - e/\sqrt{\sigma}$, where $\alpha$ is the error level tolerated and $\sigma$ is the alphabet size. We experimentally show that this index outperforms by far all othe...
Approximate String Matching
We present a radically new indexing approach for approximate string matching. The scheme uses the metric properties of the edit distance and can be applied to any other metric between strings. We build a metric space where the sites are the nodes of the suffix tree of the text, and the approximate query is seen as a proximity query on that metric space. This permits us finding the R occurrences of ...
Finding Approximate Matches in Large Lexicons
Approximate string matching is used for spelling correction and personal name matching. In this paper we show how to use string matching techniques in conjunction with lexicon indexes to find approximate matches in a large lexicon. We test several lexicon indexing techniques, including n-grams and permuted lexicons, and several string matching techniques, including string similarity measures an...